
feat: Add benchmarks #7

Merged
Pringled merged 15 commits into main from semble-benchmarks
Apr 15, 2026


@Pringled
Member

No description provided.

Pringled added 15 commits April 14, 2026 19:38
Add one-liner docstrings to all functions and methods across
benchmarks/common.py, run_benchmark.py, and sync_repos.py. Remove the D
ruff ignore for benchmarks/*.py so docstrings are enforced going forward.
Also moves count_indexed_targets into run_benchmark.py (where Chunk is
imported) to fix a pre-existing mypy Protocol error in the pre-commit env.

Full runs (no --repo/--language filters) automatically write results to
benchmarks/results/<sha>.json, keyed by the 12-char git SHA. The file
includes the full SHA, model name, per-repo rows, language aggregates,
and overall summary. Cache mode writes <sha>-cache.json. Filtered runs
are not saved.
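The save behavior described in this commit message could look roughly like the sketch below. It is not the code from run_benchmark.py: the function names, the `filtered` flag, and the payload keys are hypothetical, and only the `<sha>.json` naming with a 12-character prefix and the rule that filtered runs are skipped come from the message itself.

```python
from __future__ import annotations

import json
from pathlib import Path


def results_path(full_sha: str, results_dir: Path) -> Path:
    """Return the results file path, keyed by the first 12 characters of the SHA."""
    return results_dir / f"{full_sha[:12]}.json"


def save_results(payload: dict, full_sha: str, results_dir: Path, filtered: bool) -> Path | None:
    """Write the payload as JSON for full runs; skip filtered runs and return None."""
    if filtered:
        # Runs restricted with --repo/--language are not saved.
        return None
    results_dir.mkdir(parents=True, exist_ok=True)
    path = results_path(full_sha, results_dir)
    # The full SHA is stored inside the file; only the file name is shortened.
    path.write_text(json.dumps({"sha": full_sha, **payload}, indent=2))
    return path
```

Keying the file name by the short SHA while storing the full SHA inside the file keeps names readable without losing the exact commit.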

Drop the --cache mode (cold vs warm build timing); it was noisy and
not actionable. Instead, add index_ms to RepoResult so every full run
records index build time per repo alongside NDCG and query latency.
index_ms is included in the saved JSON and printed in the summary table.
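As a rough illustration of recording index build time per repo, the sketch below times a build step and stores the result in milliseconds. Only the names `RepoResult` and `index_ms` appear in the commit message; the other fields and the `timed_ms` helper are assumptions made for this example.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class RepoResult:
    """Hypothetical shape; only index_ms is named in the commit message."""

    repo: str
    ndcg: float
    query_ms: float
    index_ms: float


def timed_ms(fn: Callable[[], object]) -> tuple[object, float]:
    """Run fn once and return (result, elapsed wall-clock milliseconds)."""
    start = time.perf_counter()
    result = fn()
    return result, (time.perf_counter() - start) * 1000.0
```

A caller would wrap the index build in `timed_ms` and pass the elapsed value as `index_ms` when constructing the per-repo row.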
…, inline _output

- benchmarks/common.py -> benchmarks/data.py (more descriptive name)
- BENCH_ROOT: /tmp/bench -> ~/.cache/semble-bench (survives reboots)
- Inline _output into _check_repo (single call site)
- Update README to drop --cache docs and reflect new paths

Pringled merged commit d2dbdd2 into main on Apr 15, 2026
Pringled added a commit that referenced this pull request Apr 17, 2026
… annotation audit

- Fix n_relevant to use annotation count instead of index coverage (reviewer #5)
- Add per-category NDCG@10 to printed summary and saved JSON (reviewer #7)
- Replace 11 trivially-lexical semantic queries with vocabulary-diverse alternatives
- Baseline: NDCG@10 = 0.825 (architecture=0.773, semantic=0.823, symbol=0.943)
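To make the n_relevant fix and the per-category breakdown concrete, here is a minimal NDCG@10 sketch. It assumes binary relevance and unique ids in the ranked list, neither of which the commit message states, and every name in it is hypothetical. The ideal DCG is sized by the number of annotated targets rather than by how many of them were indexed, so a target missing from the index lowers the score.

```python
from __future__ import annotations

import math


def ndcg_at_k(ranked_ids: list[str], relevant_ids: set[str], k: int = 10) -> float:
    """Binary-relevance NDCG@k; the ideal ranking is sized by the annotation count."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(ranked_ids[:k])
        if item in relevant_ids
    )
    n_relevant = len(relevant_ids)  # annotation count, not index coverage
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(min(n_relevant, k)))
    return dcg / idcg if idcg > 0 else 0.0


def mean_by_category(scores: list[tuple[str, float]]) -> dict[str, float]:
    """Average per-query scores within each category label."""
    buckets: dict[str, list[float]] = {}
    for category, score in scores:
        buckets.setdefault(category, []).append(score)
    return {category: sum(vals) / len(vals) for category, vals in buckets.items()}
```

With two annotated targets and only one retrieved at rank 1, the score is below 1.0 even though every indexed target was found.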
Pringled deleted the semble-benchmarks branch April 22, 2026 05:06